BACKGROUND


Parametric cost risk is a statistical phenomenon. One first assumes that the cost is
defined by

$$C = h(p_1, \ldots, p_m)$$

where $h$ is a function of the parameters $p_1, \ldots, p_m$.

Second, one assumes that each of the parameters is a random variable. This applies to
a single cost estimating relationship (CER) which might be in the typical Cobb-Douglas form

$$C = p_1^{q_1} \cdots p_m^{q_m}$$

where the $q_i$ are the elasticities, or to the sum of n work breakdown structure (WBS) elements
$p_i$ in the form

$$C = p_1 + \cdots + p_n.$$


In a complete cost risk simulation the cost of each WBS element would be a function $h_j$
of parameters $p_1, \ldots, p_m$ with the form

$$C = h_1(p) + \cdots + h_n(p)$$

where $p$ is the vector

$$p = \begin{bmatrix} p_1 \\ \vdots \\ p_m \end{bmatrix}$$

with components $p_i$.

Third, one must make an assumption about the dependence of the variables within the 
variable set. One may assume that the variables are statistically independent, that the variables 
are totally dependent, or that correlation exists between selected pairs of variables. If the 
assumption of independence is made, then the distribution function F(C < constant) of the
sum of the WBS elements becomes arbitrarily narrow relative to its mean as more WBS elements are added. If this were the
case then we could converge on a point estimate by estimating at the lowest levels of an 
arbitrarily deep WBS. This is not the case in real life. If the assumption of total dependence is 
made then the widest distribution function occurs. It has one and only one width no matter 
how many samples are taken. This also is not the case in real life. In real life, correlation 
exists between selected pairs of variables. The focus of this paper is to examine key aspects of 
simulating this case. 
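
The two limiting assumptions can be illustrated with a small Monte Carlo sketch. It is not taken from the paper: the lognormal element costs, the sample sizes, and the function name `total_cost_spread` are assumptions chosen only for illustration. The spread is measured by the coefficient of variation (standard deviation divided by mean) of the total cost.

```python
import numpy as np

def coefficient_of_variation(total):
    """Relative width of a simulated total-cost distribution."""
    return total.std() / total.mean()

def total_cost_spread(n_elements, n_trials=20000, seed=0):
    """Return (cv_independent, cv_dependent) for a sum of n_elements WBS costs.

    Element costs are hypothetical lognormal variates (an assumption).
    """
    rng = np.random.default_rng(seed)
    # Independent case: every WBS element gets its own draw.
    indep = rng.lognormal(mean=0.0, sigma=0.5, size=(n_trials, n_elements))
    # Totally dependent case: every WBS element shares one draw per trial.
    common = rng.lognormal(mean=0.0, sigma=0.5, size=(n_trials, 1))
    dep = np.repeat(common, n_elements, axis=1)
    return (coefficient_of_variation(indep.sum(axis=1)),
            coefficient_of_variation(dep.sum(axis=1)))

results = {n: total_cost_spread(n) for n in (1, 10, 100)}
for n, (cv_ind, cv_dep) in results.items():
    print(f"n={n:3d}  independent CV={cv_ind:.3f}  dependent CV={cv_dep:.3f}")
```

Under independence the relative width shrinks roughly as one over the square root of the number of elements, while under total dependence it stays essentially constant however many elements are summed.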

CORRELATION AND GEOMETRY 


Correlation is largely perceived to be a statistical phenomenon. It is. But it is also a
geometric phenomenon (Herr, 1980). To see this, we must first view the data in vector form
(Halmos, 1974). Let $x_i$ be the vector

$$x_i = \begin{bmatrix} p_{i1} \\ \vdots \\ p_{in} \end{bmatrix}$$

where n is the number of data points selected for the parameter $p_i$. This may be viewed as a
point in n-dimensional space (Kendall, 1961). Each dimension represents the particular
instance of selecting a value $p_{ij}$ for parameter $p_i$. The transpose of the vector $x_i$ is denoted by
$x_i'$.

There are m vectors $x_i$ of dimension n. Take the mean of the n data points in each
vector. Margolis (1979) shows the mean to be the orthogonal projection of the data onto the n- 
vector (1, ... ,1). This is displayed in 2 dimensions in Figure 1, that is, with 2 data points for 
the parameter p. 


Figure 1: The Mean as an Orthogonal Projection (labels: $[1, 1]'$, $p_1$)

Figure 2: Data Vectors Adjusted for the Mean.



Denote the mean vector by $\mu = [\mu_1, \ldots, \mu_3]'$ (taking m = 3 parameters for illustration) where $\mu_i$ is the mean of the ith parameter
$p_i$. Let the mean vector be the tensor operating point (O'Neill, 1966) and translate the space by
the coordinate function $y_{ij} = x_{ij} - \mu_i$ so that the mean becomes the origin. We now have
vectors $y_1$, $y_2$, and $y_3$ which originate from the new origin at $\mu$. This is shown in two
dimensions in Figure 2.
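
The projection view of the mean and the translation to the mean can be checked numerically. The following is a minimal sketch, assuming NumPy and a hypothetical three-point data vector for a single parameter; the variable names are illustrative and not from the paper.

```python
import numpy as np

# Hypothetical data vector for one parameter (n = 3 data points).
x = np.array([3.0, 5.0, 10.0])
ones = np.ones_like(x)

# Orthogonal projection of x onto the vector (1, ..., 1).
# Every component of the projection equals the arithmetic mean of x.
proj = (x @ ones) / (ones @ ones) * ones

# Translating by the mean gives the adjusted data vector y.
y = x - proj

print("projection:", proj)
print("mean      :", x.mean())
print("y . ones  :", y @ ones)  # the residual is orthogonal to (1, ..., 1)
```

Because the residual `y` is orthogonal to the vector of ones, the adjusted data vectors lie in the subspace perpendicular to it, which is where the angles between parameters are measured.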


Note that the squared length $|y_i|^2 = y_{i1}^2 + \cdots + y_{in}^2$ of the vector $y_i$ is the sum of
squares of parameter i adjusted for the mean. Note further that the dot product of $y_i$ and $y_j$ is
$y_i \cdot y_j = y_i' y_j = |y_i| |y_j| \cos\theta_{ij}$ where $\theta_{ij}$ is the angle between the vectors $y_i$ and $y_j$. Consider
the unit vectors

$$u_i = \frac{y_i}{|y_i|}.$$

Then $u_i \cdot u_j = \cos\theta_{ij}$.


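Let $Y = [y_1, \ldots, y_m]$. Then the covariance matrix (Kendall, 1961) is

$$\Phi = Y'Y = \begin{bmatrix} y_1' \\ \vdots \\ y_m' \end{bmatrix} \begin{bmatrix} y_1 & \cdots & y_m \end{bmatrix} = \begin{bmatrix} y_1 \cdot y_1 & \cdots & y_1 \cdot y_m \\ \vdots & & \vdots \\ y_m \cdot y_1 & \cdots & y_m \cdot y_m \end{bmatrix}$$

$$\Phi = \begin{bmatrix} |y_1||y_1|\cos\theta_{11} & \cdots & |y_1||y_m|\cos\theta_{1m} \\ \vdots & & \vdots \\ |y_m||y_1|\cos\theta_{m1} & \cdots & |y_m||y_m|\cos\theta_{mm} \end{bmatrix}$$

$$\Phi = \begin{bmatrix} |y_1| & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & |y_m| \end{bmatrix} \begin{bmatrix} \cos\theta_{11} & \cdots & \cos\theta_{1m} \\ \vdots & & \vdots \\ \cos\theta_{m1} & \cdots & \cos\theta_{mm} \end{bmatrix} \begin{bmatrix} |y_1| & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & |y_m| \end{bmatrix}$$

$$\Phi = M \Psi M$$

where

$$M = \begin{bmatrix} |y_1| & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & |y_m| \end{bmatrix}$$

is the magnitude matrix and

$$\Psi = \begin{bmatrix} \cos\theta_{11} & \cdots & \cos\theta_{1m} \\ \vdots & & \vdots \\ \cos\theta_{m1} & \cdots & \cos\theta_{mm} \end{bmatrix}$$

is the correlation matrix.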

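Since M is invertible, the correlation matrix can be found from the covariance matrix by

$$\Psi = M^{-1} \Phi M^{-1}$$

where

$$M^{-1} = \begin{bmatrix} 1/|y_1| & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1/|y_m| \end{bmatrix}.$$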
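
The factorization of the covariance matrix into magnitude and correlation matrices can be verified numerically. This is a minimal sketch, assuming NumPy and synthetic data (the sample size, the induced dependence, and all variable names are assumptions). It also checks that the matrix of cosines agrees with the usual Pearson correlation matrix returned by `numpy.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: n = 50 data points (rows) for m = 3 parameters (columns).
X = rng.normal(size=(50, 3))
X[:, 1] += 0.8 * X[:, 0]          # induce some correlation between parameters

Y = X - X.mean(axis=0)            # translate so the mean is the origin
Phi = Y.T @ Y                     # covariance matrix in the paper's sense (Y'Y)

mags = np.sqrt(np.diag(Phi))      # lengths |y_i| of the adjusted data vectors
M = np.diag(mags)                 # magnitude matrix
M_inv = np.diag(1.0 / mags)

Psi = M_inv @ Phi @ M_inv         # matrix of cosines between the data vectors

print(np.round(Psi, 3))
```

Note that `Y.T @ Y` is a sum of squares and cross products without division by the number of data points; that scaling cancels when the magnitudes are divided out, so the cosines are unaffected.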



It is important to note that the covariance matrix $\Phi$, in vector form, has the same
definition as Einstein's fundamental metric tensor (Cartan, 1937) which completely determines
the geometry of the data space (Einstein, 1916). Levi-Civita (1926) notes the relationship
between the angles $\theta_{ij}$ between the basis vectors and the fundamental metric tensor $\Phi$. Although
these concepts have existed for many years they have rarely been used, or even noted, by
statisticians.

What does all this mean in terms of cost risk? It means that the geometric paradigm 
provides a way of both visualizing and implementing cost risk. 

The vectors $u_1, \ldots, u_m$ are orthogonal if and only if $\cos\theta_{ij} = 0$ for $i \neq j$. Thus
correlated parameters have non orthogonal unit data vectors. A basis is a set of vectors which
span a space or subspace such that none of them may be written as a linear combination of a
subset of the other basis vectors (Halmos, 1974). This means that $U = [u_1, \ldots, u_m]$ is a
normal but non orthogonal basis for the data space. $Y = [y_1, \ldots, y_m]$ is a non normal and non
orthogonal basis for the data space. Such bases require a second tensor concept, the first being
the establishment of the mean as the tensor operating point.

If w is any vector in our data space then $(w \cdot u_i) u_i$ is the projection of w on the unit basis
vector $u_i$ (Saville and Wood, 1991). In the terminology of tensors $(w \cdot u_i) u_i$ is the covariant
projection of w on $u_i$ and $w \cdot u_i = |w| \cos\theta_{wi}$, where $\theta_{wi}$ is the angle between w and $u_i$, is the covariant component (Pellionisz and Llinas,
1980). Unfortunately, as shown in Figure 3, the covariant components $u_{i\alpha}$ of non orthogonal
vectors can not be used to obtain the vector sum.



The vector sum must use the contravariant components $u_i^{\beta}$. These are obtained by the
transformation

$$u_i^{\beta} = \sum_{\alpha} \Psi^{\beta\alpha} u_{i\alpha}$$

where the $\Psi^{\beta\alpha}$ are the components of $\Psi^{-1}$. This is called "raising the index." Observe from
Figure 3 that the covariant and contravariant components are identical when the basis vectors 
are orthogonal. This is an excellent reason for transforming the data to the basis defined by the 
principal components (Dean, 1988) which is an orthonormal basis. In that basis, the covariant 
and contravariant components are identical. 
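
The difference between covariant and contravariant components can be made concrete in two dimensions. The sketch below is an illustration only: the two unit basis vectors (60 degrees apart) and the vector `w` are hypothetical, and the index is raised by solving with the matrix of cosines between the basis vectors rather than by forming its inverse explicitly.

```python
import numpy as np

# Hypothetical unit basis vectors, 60 degrees apart (non orthogonal).
u1 = np.array([1.0, 0.0])
u2 = np.array([0.5, np.sqrt(3.0) / 2.0])
U = np.column_stack([u1, u2])

w = np.array([2.0, 1.0])               # any vector in the space

covariant = U.T @ w                    # components w . u_i
Psi = U.T @ U                          # cosines between the basis vectors

# Raising the index: multiply by the inverse of Psi.
contravariant = np.linalg.solve(Psi, covariant)

wrong_sum = U @ covariant              # does not rebuild w
right_sum = U @ contravariant          # rebuilds w

print("covariant sum    :", wrong_sum)
print("contravariant sum:", right_sum)

# With an orthonormal basis the two sets of components coincide.
E = np.eye(2)
cov_orth = E.T @ w
contra_orth = np.linalg.solve(E.T @ E, cov_orth)
```

Only the contravariant components reproduce `w` as a vector sum over the non orthogonal basis; in the orthonormal basis the matrix of cosines is the identity and the distinction disappears.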
GENERATING CORRELATED RANDOM NUMBERS


In order to estimate cost risk, following Book and Young (1992), a set of random
numbers $p_i$ may be chosen from arbitrary distributions and then used to generate the estimated
distribution of cost with the desired correlation, which defines the cost risk. It is desired that the
variates $p_i$ have the correlation $\Psi$ and covariance $\Phi$.

Choose the variates x independently from the desired distributions, and adjust them for
the mean to obtain variates y from

$$y_{ij} = x_{ij} - \mu_i.$$


Form the covariance matrix $Y'Y$, whose magnitude matrix $M$ is taken as the desired magnitude in $\Phi$.
Note that

$$Y'Y = M \Psi_y M = M M$$

since the variates were chosen independently, so that their correlation matrix $\Psi_y$ is the identity (up to sampling error in a finite sample). Following Fukunaga (1990) we transform the
variates by

$$Z = Y M^{-1}$$

to obtain

$$Z'Z = (M^{-1})' Y'Y M^{-1} = M^{-1} M M M^{-1} = I.$$


Thus the variates z are uncorrelated and have unit magnitude. Choosing new variates v defined
by

$$V = Z \Psi^{1/2}$$

we have

$$V'V = (Z \Psi^{1/2})' Z \Psi^{1/2} = (\Psi^{1/2})' Z'Z \Psi^{1/2} = (\Psi^{1/2})' \Psi^{1/2} = \Psi.$$


Finally, choosing new variates u defined by

$$U = V M = Z \Psi^{1/2} M = Y M^{-1} \Psi^{1/2} M$$

we have

$$U'U = (Y M^{-1} \Psi^{1/2} M)' \, Y M^{-1} \Psi^{1/2} M$$
$$= M (\Psi^{1/2})' M^{-1} \, Y'Y \, M^{-1} \Psi^{1/2} M$$
$$= M (\Psi^{1/2})' M^{-1} M M M^{-1} \Psi^{1/2} M$$
$$= M \Psi M = \Phi.$$


Thus the variates u have the desired correlation and covariance. Adjusting for the mean by

$$w_{ij} = u_{ij} + \mu_i$$

we obtain the desired variates $w_{ij}$.
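
The sequence of transformations above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `correlate`, the three input distributions, and the target correlation matrix are hypothetical, and the square root of the correlation matrix is taken here from NumPy's Cholesky factorization (one of the options the text mentions). Because independently drawn finite samples are only approximately uncorrelated, the target correlation is reached up to sampling error.

```python
import numpy as np

def correlate(X, Psi_half):
    """Impose a target correlation on independently drawn variates.

    X        : n x m array, one column per parameter, drawn independently.
    Psi_half : m x m matrix with Psi_half.T @ Psi_half equal to the target.
    """
    mu = X.mean(axis=0)
    Y = X - mu                                # adjust for the mean
    mags = np.linalg.norm(Y, axis=0)          # magnitudes |y_i|
    Z = Y @ np.diag(1.0 / mags)               # unit magnitude, ~uncorrelated
    V = Z @ Psi_half                          # impose the correlation
    U = V @ np.diag(mags)                     # restore the magnitudes
    return U + mu                             # restore the means

rng = np.random.default_rng(2)
n = 20000
# Hypothetical input distributions, one per parameter.
X = np.column_stack([
    rng.triangular(1.0, 2.0, 5.0, size=n),
    rng.lognormal(mean=0.0, sigma=0.5, size=n),
    rng.uniform(10.0, 20.0, size=n),
])

Psi_target = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, 0.5],
                       [0.3, 0.5, 1.0]])
Psi_half = np.linalg.cholesky(Psi_target).T   # upper factor: R'R = Psi

W = correlate(X, Psi_half)
print(np.round(np.corrcoef(W, rowvar=False), 3))
```

The means of the columns of `W` match those of `X` exactly, since each transformed column is a linear combination of mean-adjusted columns before the means are added back.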

FINDING THE SQUARE ROOT OF THE CORRELATION MATRIX 

The square root of a correlation matrix is not unique. The Choleski factorization can be 
used (Book and Young, 1992). Another technique used by the author is as follows: 

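Following Dean (1988), find the principal components (Overall and Klett, 1983; Press,
Teukolsky, Vetterling, and Flannery, 1992) of the correlation matrix. Thus we have the
eigenvector matrix $\Omega$ and the diagonal eigenvalue matrix $\Lambda$ such that

$$\Psi \Omega = \Omega \Lambda$$

and

$$\Omega \Omega' = \Omega' \Omega = I$$

where the columns of $\Omega$ are the eigenvectors of $\Psi$ and

$$\Lambda = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_m \end{bmatrix}.$$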


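Thus

$$\Psi = \Omega \Lambda \Omega' = \Omega \Lambda^{1/2} \Lambda^{1/2} \Omega' = \Omega \Lambda^{1/2} (\Lambda^{1/2})' \Omega' = (\Omega \Lambda^{1/2}) (\Omega \Lambda^{1/2})'.$$

Letting

$$\Psi^{1/2} = (\Omega \Lambda^{1/2})' = \Lambda^{1/2} \Omega'$$

we have

$$\Psi = (\Psi^{1/2})' \Psi^{1/2}$$

as desired.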

Thus the desired variates $w_{ij}$ are

$$w_{ij} = u_{ij} + \mu_i$$

where

$$U = Y M^{-1} \Lambda^{1/2} \Omega' M.$$
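
The eigenvector construction of the square root can be checked numerically. The sketch below assumes NumPy's `numpy.linalg.eigh` for the symmetric eigenproblem and uses a hypothetical 3 x 3 correlation matrix; it also computes a Cholesky-based root to illustrate that the square root is not unique.

```python
import numpy as np

# Hypothetical target correlation matrix (symmetric, positive definite).
Psi = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Principal components: Psi @ Omega = Omega @ diag(eigvals).
eigvals, Omega = np.linalg.eigh(Psi)
Lambda_half = np.diag(np.sqrt(eigvals))

# Square root in the sense used in the text: (Omega Lambda^{1/2})'.
Psi_half_eig = Lambda_half @ Omega.T

# A different square root from the Cholesky factorization.
Psi_half_chol = np.linalg.cholesky(Psi).T

print(np.round(Psi_half_eig.T @ Psi_half_eig, 6))
print(np.round(Psi_half_chol.T @ Psi_half_chol, 6))
```

Both roots reproduce the correlation matrix when multiplied by their own transpose, yet they are different matrices: the Cholesky root is triangular, while the eigenvector root generally is not.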
OBSERVATIONS


The geometric viewpoint identifies the choice of a correlation matrix for the simulation 
of cost risk with the pairwise choice of data vectors corresponding to the parameters used to 
obtain cost risk. The correlation coefficient is the cosine of the angle between the data vectors 
after translation to an origin at the mean and normalization for magnitude. Thus correlation is 
equivalent to expressing the data in terms of a non orthogonal basis. To understand the many 
resulting phenomena requires the use of the tensor concept of raising the index to transform the 
measured and observed covariant components into contravariant components before vector 
addition can be applied. 

The geometric viewpoint also demonstrates that correlation and covariance are 
geometric properties, as opposed to purely statistical properties, of the variates. Thus, variates 
from different distributions may be correlated, as desired, after selection from independent 
distributions. 

By determining the principal components of the correlation matrix, variates with the 
desired mean, magnitude, and correlation can be generated through linear transforms which 
include the eigenvalues and the eigenvectors of the correlation matrix. 

The conversion of the data to a non orthogonal basis uses a compound linear 
transformation which distorts or stretches the data space. Hence, the correlated data does not 
have the same properties as the uncorrelated data used to generate it. This phenomenon is
responsible for seemingly strange observations such as the fact that the marginal distributions 
of the correlated data can be quite different from the distributions used to generate the data. 

The joint effect of statistical distributions and correlation remains a fertile area for further 
research. 
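
The change in the marginal distributions can be seen in a small experiment. This is a sketch under assumptions, not a result from the paper: two independent uniform variates are correlated at a hypothetical 0.8 using a Cholesky root, and excess kurtosis is used as a simple shape measure (it is -1.2 for a uniform distribution).

```python
import numpy as np

def excess_kurtosis(a):
    """Sample excess kurtosis (0 for a normal, -1.2 for a uniform)."""
    d = a - a.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(3)
X = rng.uniform(size=(100_000, 2))        # two independent uniform variates

Psi = np.array([[1.0, 0.8],
                [0.8, 1.0]])
R = np.linalg.cholesky(Psi).T             # R.T @ R equals Psi

mu = X.mean(axis=0)
Y = X - mu
mags = np.linalg.norm(Y, axis=0)
W = (Y / mags) @ R * mags + mu            # same chain of linear transforms

k_first = excess_kurtosis(W[:, 0])        # left unchanged by a triangular root
k_second = excess_kurtosis(W[:, 1])       # a mixture of both inputs

print(f"first variate : {k_first:.3f}")
print(f"second variate: {k_second:.3f}")
```

With a triangular root the first variate passes through unchanged and stays uniform, while the second becomes a weighted sum of two uniforms and is no longer uniform. A different choice of square root would spread the distortion differently across the variates.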

In terms of application to cost estimating, the geometric approach demonstrates that the 
estimator must have data and must understand that data in order to properly choose the 
correlation matrix appropriate for a given estimate. 

There is a general feeling by employers and managers that the field of cost requires little 
technical or mathematical background. Contrary to that opinion, this paper demonstrates that a 
background in mathematics equivalent to that needed for typical engineering and scientific 
disciplines at the masters or doctorate level is appropriate within the field of cost risk. 